Open
Description
I am trying to scrape the realestate.com.au website, but whenever I go to the second page, it shows undefined. I suspect I am being flagged as a bot. Is there anything I can do to get around this?
Full steps to reproduce the issue
After connecting, go to https://www.realestate.com.au/sold/in-5000/list-6, then try to go to https://www.realestate.com.au/sold/in-5000/list-7 using the same page.
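The steps above can be sketched as a small script. This is only a minimal sketch, not the reporter's actual code: `PageLike`, `buildSoldListUrl`, and `visitListPages` are hypothetical names, and the interface assumes nothing beyond the `goto` and `title` methods that a Puppeteer page exposes.

```typescript
// Minimal subset of a Puppeteer Page used here (an assumption for illustration).
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
}

// Hypothetical helper: builds the sold-listing URL for a postcode and page number.
function buildSoldListUrl(postcode: string, pageNumber: number): string {
  return `https://www.realestate.com.au/sold/in-${postcode}/list-${pageNumber}`;
}

// Visits consecutive list pages on the SAME page object and records each title,
// so a failed second navigation shows up as an empty or unexpected title.
async function visitListPages(
  page: PageLike,
  postcode: string,
  firstPage: number,
  lastPage: number,
): Promise<{ url: string; title: string }[]> {
  const visited: { url: string; title: string }[] = [];
  for (let n = firstPage; n <= lastPage; n++) {
    const url = buildSoldListUrl(postcode, n);
    await page.goto(url);
    visited.push({ url, title: await page.title() });
  }
  return visited;
}
```

Passing the `page` returned by `connect` and the range 6 to 7 for postcode "5000" would replay the two navigations described above and make it easier to see at which step the page stops responding.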
Issue Type
Others
Operating System
Other
Do you use Docker?
I don't use Docker
Activity
RubberArchind commented on Jan 10, 2025
Can you show your configuration?
delvin02 commented on Jan 10, 2025
Initially I was trying to open a new browser each time a page finished scraping, but that does not work well either. Happy for you to try it out!
import {
  BASE_URL,
  SOLD_LISTING__PATH,
  PROPERTY_LISTING_RESULT__CLASS,
  PROPERTY_LISTING_CONTENT__CLASS,
  ADDRESS__CLASS,
  SOLD_PRICE_TAG__CLASS,
  PROPERTY_LINK__CLASS,
} from "../../constants/realestate";
import type { IScraper, PropertyDetail } from "../@interfaces";
import { ChalkLogger } from "../helper/chalk-logger";
import { getListLinkPath, getPostCodeLinkPath } from "../helper/realestate";
import { connect } from "puppeteer-real-browser";
import { promises as fs } from "fs";
import { join, resolve } from "path";
import Papa from "papaparse";
import { RotationalProxy } from "../proxy/proxy";

export class RealEstateScraper implements IScraper {
  public name = "realestate";
  private readonly logger = new ChalkLogger();
  private readonly batchSize = 1000;
  private rotationalProxy: RotationalProxy;

  constructor() {
    this.rotationalProxy = new RotationalProxy();
  }

  async scrape(postcode: string): Promise {
    let allResults: PropertyDetail[] = [];
    let batchNumber: number = 1;
  }
}
delvin02 commented on Jan 11, 2025
Sometimes, when I navigate the page to the website, the UI is clearly displayed on screen, but for some reason I can't access the DOM.
RubberArchind commented on Jan 11, 2025
When it is in the "couldn't access the DOM" state, can you interact with the page manually?
captainjackrana commented on Jan 15, 2025
Try removing flags like --start-maximized and --window-size, and set defaultViewport to null.
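A rough sketch of what that suggestion could look like in code. `buildConnectOptions` is a hypothetical helper, not part of puppeteer-real-browser, and the sketch assumes that the library's `connectOption` field is forwarded to Puppeteer's connect call, where `defaultViewport: null` turns off the fixed viewport.

```typescript
// Window-sizing flags the comment above suggests dropping.
const WINDOW_FLAGS = ["--start-maximized", "--window-size"];

// Hypothetical helper: removes the window-sizing flags (with or without "=value")
// from the launch arguments and sets defaultViewport to null.
function buildConnectOptions(args: string[]) {
  const filteredArgs = args.filter(
    (arg) => !WINDOW_FLAGS.some((flag) => arg === flag || arg.startsWith(`${flag}=`)),
  );
  return {
    args: filteredArgs,
    // Assumption: connectOption is passed through to Puppeteer's connect call.
    connectOption: { defaultViewport: null },
  };
}

// Usage (requires puppeteer-real-browser; not executed here):
// const { browser, page } = await connect(buildConnectOptions(existingArgs));
```

Filtering the existing argument list, rather than rewriting it by hand, keeps any unrelated flags in place, so only the two window-related flags change between runs.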